

# ORIGINAL

AF\$ EFW

HEWLETT-PACKARD COMPANY Intellectual Property Administration P.O. Box 272400 Fort Collins, Colorado 80527-2400 PATENT APPLICATION

| ATTORNEY | DOCKET NO | • | 200301732-1 |
|----------|-----------|---|-------------|
|          |           |   |             |

2192

## **IN THE**

#### UNITED STATES PATENT AND TRADEMARK OFFICE

Inventor(s): Steven O. HOBBS et al.

Confirmation No.: 4992

Group Art Unit:

Application No.: 09/785,143

Examiner: A. R. Fowlkes

Filing Date: 02/16/2001

Title: METHOD AND APPARATUS FOR REDUCING CACHE THRASHING

Mail Stop Appeal Brief-Patents Commissioner For Patents PO Box 1450 Alexandria VA 22313-1450

| Alexandria, VA 22313-1450                                                                                                                                                                                                                                                                                                                   |                                                                                            |                                                                                                                                       |                                                                                         |                                                                   |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|-------------------------------------------------------------------|--|--|
|                                                                                                                                                                                                                                                                                                                                             | TRANSMITTAL                                                                                | OF APPEAL BRIEF                                                                                                                       |                                                                                         |                                                                   |  |  |
| Fransmitted herewith is the Appeal Brief in                                                                                                                                                                                                                                                                                                 | n this application w                                                                       | vith respect to the Notice of A                                                                                                       | Appeal filed on                                                                         | 09/29/2005                                                        |  |  |
| The fee for filing this Appeal Brief is (37 C                                                                                                                                                                                                                                                                                               | FR 1.17(c)) \$500.0                                                                        | 00.                                                                                                                                   |                                                                                         |                                                                   |  |  |
|                                                                                                                                                                                                                                                                                                                                             | (complete (a) o                                                                            | r (b) as applicable)                                                                                                                  |                                                                                         |                                                                   |  |  |
| `<br>Γhe proceedings herein are for a patent a                                                                                                                                                                                                                                                                                              | pplication and the                                                                         | provisions of 37 CFR 1.136(                                                                                                           | a) apply.                                                                               |                                                                   |  |  |
| (a) Applicant petitions for an extensio months checked below:                                                                                                                                                                                                                                                                               | n of time under 37                                                                         | 7 CFR 1.136 (fees: 37 CFR                                                                                                             | 1.17(a)-(d)) for the                                                                    | e total number of                                                 |  |  |
| 1st Month \$120                                                                                                                                                                                                                                                                                                                             | 2nd Month<br>\$450                                                                         | 3rd Month \$1020                                                                                                                      | 4th Mor<br>\$1590                                                                       |                                                                   |  |  |
| ☐ The extension fee has already been  ☐ The extension fee has already been  ☐ (b) Applicant believes that no extension the possibility that applicant has inal processes charge to Deposit Account 08-202 blease charge any fees required or created the control of the code of Federal sections in Title 37 of the Code of Federal charge. | n of time is require dvertently overloo  25 the sum of edit any over pay eposit Account 08 | d. However, this conditional ked the need for a petition at 5 500 . At any time dument to Deposit Account (-2025 under 37 CFR 1.16 tl | nd fee for extension<br>ring the pendency of<br>08-2025 pursuant<br>hrough 1.21 inclusi | of time.  of this application, to 37 CFR 1.25.  ve, and any other |  |  |
| I hereby certify that this correspon<br>deposited with the United States Posta<br>class mail in an envelope addressed to:<br>Commissioner for Patents, Alexandria,<br>Date of Deposit: 11/28/2005                                                                                                                                           | dence is being<br>al Service as first                                                      | Respectfully submit                                                                                                                   | Respectfully submitted, Steven O. HOBBS et al.                                          |                                                                   |  |  |
| OR                                                                                                                                                                                                                                                                                                                                          |                                                                                            | Alan D. Christens                                                                                                                     | son                                                                                     |                                                                   |  |  |
| I hereby certify that this paper is bein<br>the Patent and Trademark Office fa<br>(571)273-8300.                                                                                                                                                                                                                                            |                                                                                            | Attorney/Agent fo                                                                                                                     | Attorney/Agent for Applicant(s)  Reg No.: 54,036                                        |                                                                   |  |  |
| Date of facsimile:                                                                                                                                                                                                                                                                                                                          |                                                                                            | Date :                                                                                                                                | 11/28/2005                                                                              |                                                                   |  |  |
| Typed Name: Colleen F. Brown Signature: Lalle 7.                                                                                                                                                                                                                                                                                            | Brein                                                                                      | Telephone :                                                                                                                           | (713) 238-8000                                                                          |                                                                   |  |  |



## IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

Appellants: Steven O. HOBBS et al. Confirmation No.: 4992

Serial No.: 09/785,143 **Group Art Unit:** 2192

Filed: 02/16/2001 Examiner: Andre R. Fowlkes

For: Method And Apparatus Docket No.: 200301732-1 For Reducing Cache

Thrashing

## APPEAL BRIEF

Mail Stop Appeal Brief - Patents Commissioner for Patents PO Box 1450 Alexandria, VA 22313-1450

Sir:

Appellants hereby submit this Appeal Brief in connection with the aboveidentified application. A Notice of Appeal was filed via facsimile on September 29, 2005.

12/02/2005 DEMMANU1 00000065 082025 09785143

01 FC:1402

500.00 DA

Date: November 28, 2005

## **TABLE OF CONTENTS**

| I.    | REA                                           | L PARTY IN INTEREST                                                                       | 3  |  |  |  |
|-------|-----------------------------------------------|-------------------------------------------------------------------------------------------|----|--|--|--|
| II.   | RELATED APPEALS AND INTERFERENCES             |                                                                                           |    |  |  |  |
| III.  | STATUS OF THE CLAIMS                          |                                                                                           |    |  |  |  |
| IV.   | STATUS OF THE AMENDMENTS                      |                                                                                           |    |  |  |  |
| V.    | SUMMARY OF THE CLAIMED SUBJECT MATTER         |                                                                                           |    |  |  |  |
| VI.   | GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL |                                                                                           |    |  |  |  |
| VII.  | ARGUMENT                                      |                                                                                           |    |  |  |  |
|       | A.                                            | Overview of Carr                                                                          | 19 |  |  |  |
|       | B.                                            | Overview of McGehearty                                                                    | 20 |  |  |  |
|       | C.                                            | Did amendments to claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 introduce new matter? | 21 |  |  |  |
|       | D.                                            | Claims 1-8, with claim 1 representing this group                                          | 21 |  |  |  |
|       | E.                                            | Claims 9-16, with claim 9 representing this group                                         | 23 |  |  |  |
|       | F.                                            | Claims 17-23, with claim 17 representing this group                                       | 24 |  |  |  |
|       | G.                                            | Claims 24, 35-37 and 39-40, with claim 35 representing this group                         | 25 |  |  |  |
|       | Н.                                            | Claims 33 and 34, with claim 33 representing this group                                   | 26 |  |  |  |
|       | 1.                                            | Claims 41-47, with claim 41 representing this group                                       | 27 |  |  |  |
| VIII. | CON                                           | ICLUSION                                                                                  | 28 |  |  |  |
| IX.   | CLA                                           | IMS APPENDIX                                                                              | 29 |  |  |  |
| Χ.    | EVIDENCE APPENDIX                             |                                                                                           |    |  |  |  |
| XI.   | RELATED PROCEEDINGS APPENDIX5                 |                                                                                           |    |  |  |  |
|       |                                               |                                                                                           |    |  |  |  |

## I. REAL PARTY IN INTEREST

The real party in interest is the Hewlett-Packard Development Company (HPDC), a Texas Limited Partnership, having its principal place of business in Houston, Texas. HPDC is a wholly owned affiliate of Hewlett-Packard Company (HPC). HPC merged with Compaq Computer Corporation (CCC) which owned Compaq Information Technologies Group, L.P. (CITG). The Assignment from the inventors to CCC was recorded on February 16, 2001, at Reel/Frame 011614/0531. The Assignment from CCC to CITG was recorded on June 7, 2005, at Reel/Frame 016306/0921. The Change of Name document was recorded on June 8, 2005, at Reel/Frame 016313/0854.

## II. RELATED APPEALS AND INTERFERENCES

Appellants are unaware of any related appeals or interferences.

#### III. STATUS OF THE CLAIMS

In the final Office action dated July 27, 2005, the Examiner rejected claims 1-40. Appellants submitted a response to the final Office action on September 27, 2005, in which Appellants unintentionally canceled claims 26-32. On November 17, 2005, Appellants submitted an Amendment after Notice of Appeal to cancel claims 25 and 38 and to insert the content of the previously canceled claims 26-32 into new claims 41-47. Appellants are appealing the rejection of claims 1-24 and 33-47.

#### IV. STATUS OF THE AMENDMENTS

A Response to Final Office Action dated July 27, 2005, was filed with the U.S. Patent and Trademark Office on September 27, 2005. In the Response, claims 26-32 were unintentionally canceled. On November 17, 2005, Appellants submitted an Amendment after Notice of Appeal to cancel claims 25 and 38 and to insert the content of the previously canceled claims 26-32 into new claims 41-47. The amendment was not a substantive amendment, does not require further searching, and merely puts the remaining claims in better condition for appeal. Appellants assume these amendments have been or will be entered. The Claims Appendix includes two sets of claims: 1) a copy of the claims to be considered for appeal (claims 1-24 and 33-47); and 2) a copy of the claims rejected by the Examiner in the final Office action dated July 27, 2005 (claims 1-40).

#### V. SUMMARY OF THE CLAIMED SUBJECT MATTER

The following provides a concise explanation of the subject matter defined in each of the claims involved in the appeal, referring to the specification by page and line number and to the drawings by reference characters, as required by 37 C.F.R. § 41.37(c)(1)(v). Each element of the claims is identified by a corresponding reference to the specification and drawings where applicable (*i.e.*, the claims are annotated using parenthesis). Note that the citation to passages in the specification and drawings for each claim element does not imply that the limitations from the specification and drawings should be read into the corresponding claim element.

1. (Previously presented) A method, comprising:

identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25);

identifying each vector memory reference in the loop (see Fig. 4; p. 13, lines 5-7; and p. 13, lines 16-19);

determining dependencies between vector memory references in the loop, including determining unidirectional and circular dependencies (see Fig. 4 and p. 13, line 19 – p. 14, line 2);

distributing the vector memory references into a plurality of detail loops configured to allocate the vector memory references into a plurality of temporary arrays, sized and located, so that none of the vector memory references are cache synonyms, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references (see Fig 4 and p. 14, line 21 – p. 15, line 24);

analyzing an execution profile of the program after said distributing (see Fig. 2 and p. 8, line 11 – p. 9, line 7); and

based on the execution profile, determining whether to repeat said identifying a loop, said identifying each vector memory reference,

said determining dependencies, and said distributing (see Fig. 2 and p. 8, line 11 - p. 9, line 7).

- 2. (Original) A method, as set forth in claim 1, further comprising allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas (p. 15, lines 12-24).
- 3. (Original) A method, as set forth in claim 1, further comprising at least one section loop including the plurality of detail loops (p. 15, lines 4-5).
- 4. (Original) A method, as set forth in claim 1, wherein distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management (see 320 of Fig. 4 and p. 14, lines 4-14).
- 5. (Original) A method, as set forth in claim 1, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 6-9 and p. 20, lines 27-37).
- 6. (Original) A method, as set forth in claim 1, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 9-25).
- 7. (Original) A method, as set forth in claim 1, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 19, line 1 p. 20, line 37).

- 8. (Original) A method, as set forth in claim 1, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, line 7 p. 20, line 37).
- (Previously presented) A method, comprising:
   identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25);
   identifying each vector memory reference in the loop (see Fig. 4; p. 13, lines 5-7; and p. 13, lines 16-19);
  - determining dependencies between vector memory references in the loop (see Fig. 4 and p. 13, line 19 p. 14, line 2); and
  - distributing the vector memory references into a plurality of detail loops that serially proceed through strips of the vector memory references and store the strips in temporary arrays so that none of the vector memory references are cache synonyms, wherein the vector memory references that have dependencies therebetween are included in a common detail loop (see Fig 4 and p. 14, line 21 p. 15, line 24);
  - wherein said distributing the vector memory references into a plurality of detail loops is performed by a first computer for execution by a second computer (p. 7, lines 19-22).
- 10. (Original) A method, as set forth in claim 9, further comprising allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas (p. 15, lines 12-24).

- 11. (Original) A method, as set forth in claim 9, further comprising at least one section loop including the plurality of detail loops (p. 15, lines 4-5).
- 12. (Original) A method, as set forth in claim 9, wherein distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management (see 320 of Fig. 4 and p. 14, lines 4-14).
- 13. (Original) A method, as set forth in claim 9, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 6-9 and p. 20, lines 27-37).
- 14. (Original) A method, as set forth in claim 9, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 9-25).
- 15. (Original) A method, as set forth in claim 9, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 19, line 1 p. 20, line 37).
- 16. (Original) A method, as set forth in claim 9, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, line 7 p. 20, line 37).

- 17. (Previously presented) A method, comprising: identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25); identifying each vector memory reference in the loop (see Fig. 4; p. 13, lines 5-7; and p. 13, lines 16-19);
  - determining dependencies between vector memory references in the loop (see Fig. 4 and p. 13, line 19 p. 14, line 2);
  - distributing the vector memory references into a plurality of detail loops in response to cache behavior and the dependencies between the vector memory references in the loop, wherein the detail loops cause storage of the vector memory references in temporary arrays that are allocated consecutively so that no temporary arrays elements are cache synonyms (see Fig 4 and p. 14, line 21 p. 15, line 24),
  - wherein said identifying a loop, said identifying each vector memory reference, said determining dependencies between vector memory references and said distributing the vector memory references into a plurality of detail loops produce code that is substantially independent of a computer architecture (p. 8, lines 1-7); and
  - performing code optimizations that are dependent on a computer architecture after said distributing (p. 8, lines 18-22).
- 18. (Original) A method, as set forth in claim 17, wherein distributing the vector memory references further comprises distributing the vector memory references into the plurality of detail loops with each loop having at least one of the identified vector memory references (see Fig 4 and p. 14, line 21 p. 15, line 24).
- 19. (Original) A method, as set forth in claim 17, further comprising determining dependencies between vector memory references in the loop, and wherein distributing the loop includes distributing the vector memory references into the plurality of detail loops, wherein the vector memory references that have

circular dependencies therebetween are included in a common detail loop (p. 13, lines 16-21).

- 20. (Original) A method, as set forth in claim 17, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 6-9 and p. 20, lines 27-37).
- 21. (Original) A method, as set forth in claim 17, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 9-25).
- 22. (Original) A method, as set forth in claim 17, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 19, line 1 p. 20, line 37).
- 23. (Original) A method, as set forth in claim 17, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, line 7 p. 20, line 37).
- 24. (Previously presented) A computer programmed to perform a method, comprising:

identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25);

identifying each vector memory reference in the loop (see Fig. 4; p. 13, lines 5-7; and p. 13, lines 16-19);

- determining dependencies between vector memory references in the loop (see Fig. 4 and p. 13, line 19 p. 14, line 2); and
- distributing the vector memory references into a plurality of detail loops configured to retrieve strips of the vector memory references and store the strips in temporary arrays, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop (see Fig 4 and p. 14, line 21 p. 15, line 24),
- wherein the temporary arrays are configured to simultaneously fit in a single cache bank (p. 15, lines 16-20 and p. 17, line 11-15).
- 33. (Previously presented) A method for reducing the likelihood of cache thrashing by software to be executed on a computer system (100) having a cache (210, 220), comprising:
- executing the software on the computer system (100) (Fig. 1 and p. 7, lines 6-13);
  - generating a profile indicating the manner in which the software uses the cache (210, 220) (Fig. 2 and p. 8, lines 11-22);
  - identifying a portion of the software that exhibits cache thrashing based on the profile data (p. 8, lines 11-22 and p. 12, line 23 p. 13, line 3); and
  - modifying the identified portion of the software to reduce the likelihood of cache thrashing by distributing cache synonyms into detail loops configured to allocate the cache synonyms into temporary storage areas, sized and located, to prevent cache thrashing (p. 14, line 21–p. 15, line 24),
  - wherein said modifying occurs before optimizations that are based on an architecture of the computer system (100) (p. 8, lines 14-22).

34. (Previously presented) A method, as set forth in claim 33, wherein modifying the identified portion of the software to reduce the likelihood of cache thrashing further comprises:

identifying a loop in the identified portion of the software (see Fig. 4 and p. 13, lines 23-25);

identifying each vector memory reference in the identified loop (see Fig. 4; p. 13, lines 5-7; and p. 13, lines 16-19);

- determining dependencies between the vector memory references in the identified loop of the software, including determining unidirectional and circular dependencies (see Fig. 4 and p. 13, line 19 p. 14, line 2); and
- reducing cache thrashing by distributing the vector memory references into a plurality of detail loops, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references (see Fig 4 and p. 14, line 21 p. 15, line 24).
- 35. (Previously presented) A method for reducing the likelihood of cache thrashing by software to be executed on a computer system (100) having a cache (210, 220), comprising:

executing the software on the computer system (100) (Fig. 1 and p. 7, lines 6-13);

- generating a profile indicating the manner in which the memory references of the software use the cache (210, 220) (Fig. 2 and p. 8, lines 11-22);
- identifying a portion of the memory references based on the profile, wherein the portion of the memory references is determined to cause cache thrashing (p. 8, lines 11-22 and p. 12, line 23 p. 13, line 3); and

- reducing cache thrashing by distributing the portion of the memory references into distinct loops that allocate strips of the memory references into temporary arrays for execution (p. 14, line 21 p. 15, line 24),
- wherein the temporary arrays are configured to simultaneously fit in a single cache bank (p. 15, lines 16-20 and p. 17, line 11-15).
- 36. (Previously presented) The computer of claim 24 wherein the temporary arrays are allocated consecutively such that no temporary array elements are cache synonyms (p. 17, lines 10-15).
- 37. (Previously presented) The computer of claim 24 wherein the details loops are allocated into section loops that cause iterative execution of the detail loops based on a size of the strips (p. 17, lines 17-27).
- 39. (Previously presented) The method of claim 35 wherein the temporary arrays are located and sized to reduce cache thrashing (p. 15, lines 12-24).
- 40. (Previously presented) The method of claim 35 wherein the temporary arrays are allocated consecutively and iteratively executed by a size of the strips (p. 17, lines 10-27).
- 41. (Previously presented) A compiler, comprising:
  - means for identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25);
  - means for identifying each vector memory reference in the loop (see Fig. 4; p. 13, lines 5-7; and p. 13, lines 16-19);
  - means for determining dependencies between vector memory references in the loop, including determining unidirectional and circular dependencies (see Fig. 4 and p. 13, line 19 p. 14, line 2);

- means for distributing the vector memory references into a plurality of detail loops configured to serially process strips of the vector memory references so that thrashing does not occur, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references (see Fig 4 and p. 14, line 21 p. 15, line 24);
- means for determining an execution profile of the program after said distributing occurs (p. 8, lines 1-9); and
- means for selectively repeating use of said means for identifying a loop, said means for identifying each vector memory reference, said means for determining dependencies, said means for distributing the vector memory references into a plurality of detail loops, and said means for determining an execution profile based on said execution profile (p. 8, line 11, p. 9, line 7).
- 42. (Original) A compiler, as set forth in claim 41, further comprising means for allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas (p. 15, lines 12-24).
- 43. (Original) A compiler, as set forth in claim 41, wherein the means for distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management (see 320 of Fig. 4 and p. 14, lines 4-14).
- 44. (Original) A compiler, as set forth in claim 41, further comprising inserting cache management instructions into at least one of said detail loops to control

movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 6-9 and p. 20, lines 27-37).

- 45. (Original) A compiler, as set forth in claim 41, further comprising means for inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, lines 9-25).
- 46. (Original) A compiler, as set forth in claim 41, further comprising means for performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 19, line 1 p. 20, line 37).
- 47. (Original) A compiler, as set forth in claim 41, further comprising means for inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory (p. 18, line 7 p. 20, line 37).

#### VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

Did amendments to claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 introduce new matter?

Whether claims 1-8 are unpatentable over "Compiler Optimizations for Improving Data Locality" by Steve Carr et at. (hereinafter referred to as the "Carr" reference) in view of U.S. Pat. No. 6,029,225 ("McGehearty").

Whether claims 9-16 are unpatentable over Carr in view of McGehearty.

Whether claims 17-23 are unpatentable over Carr in view of McGehearty.

Whether claims 24, 35-37 and 39-40 are unpatentable over Carr in view of McGehearty.

Whether claims 33-34 are unpatentable over Carr in view of McGehearty.

Whether claims 41-47 are unpatentable over Carr in view of McGehearty.

#### VII. ARGUMENT

The claims do not stand or fall together. Instead, Appellants present separate arguments for various independent and dependent claims. Each of these arguments is separately argued below and presented with separate headings and sub-headings as required by 37 C.F.R. § 41.37 (c)(1)(vii).

#### A. Overview of Carr

Carr teaches a compiler that optimizes data locality based on a model that computes both temporal and spatial reuse of cache lines to find desirable loop organizations. In describing the compiler, Carr states:

The cost model drives the application of compound transformations consisting of loop transformation, loop fusion, loop distribution, and loop reversal.

Page 252, Col. L, lines 7-10

By driving heuristics with a cache model, our algorithms are efficient and usually find the best transformations for data locality using permutation, fusion and distribution.

Page 253, Col. L, lines 47-50

Loop distribution separates independent statements in a single loop into multiple loops with identical headers. ...In our system we only use loop distribution to indirectly improve reuse by enabling loop permutation on a nest that is not permutable. ...We only call *Distribute* if memory order cannot be achieved on a nest and not all the inner nests can be fused. ...We thus perform distribution only if it combines with permutation to improve the actual LoopCost.

Page 256, Col. L, line 48 - Col. R, line 33

Distribution could also be effective if there is no temporal locality between partitions and the accessed arrays are too numerous to fit in cache at once, or register pressure is a concern. We do not address these issues here.

Page 256, Footnote 3

Based on the above, Carr suggests that distribution could be effective if accessed arrays are too numerous to fit in cache at once, but states "we do not address these concerns here." Accordingly, while Carr teaches a compiler that uses loop distribution in some circumstances, Carr does not address using distribution when accessed arrays are too numerous to fit in cache at once. In

contrast, Appellants' claimed invention uses distribution when accessed arrays are too numerous to fit in cache at once (causing cache thrashing).

## B. Overview of McGehearty

McGehearty teaches a cache collision avoidance mechanism. In describing the mechanism, McGehearty states:

[i]f the source and destination addresses have different cache addresses then the data flow into and out of the cache operates in an enhanced mode. If the source and destination addresses have nearly matching cache addresses then the data flow into and out of the cache operates in a near cache collision mode. If the source and destination addresses have exactly matching addresses then the data flow into and out of the cache operates in an exact cache collision avoidance mode.

### Col. 2, lines 19-27

The enhanced mode of cache bank conflict avoidance carefully orders loads and stores so that loads to one memory bank are performed on the same clock cycles as the stores to the other memory bank. After a group of loads and stores are completed, then operations for each bank are switched so that a group of stores are performed on one bank and a group of loads are performed to the other bank.

### Col. 2. lines 27-39

The near cache collision avoidance mode determines the relative locations of the source and destination addresses within the cache. If the source address is slightly after the destination, then the enhanced mode will function properly. If the destination address is slightly after the source address, the groups of cache lines is loaded into registers, and then the registers are stored to memory without any interleaving of other loads and stores.

#### Col. 2, lines 40-47

The exact cache collision avoidance mode restructures the loop of moving data to form a series of loads to get several cache lines staged for loading, each element of data is not only moved into cache, but into registers. This is pipelined so that after this initial set of loads is performed, then additional loads are interleaved with non-cache conflicting stores to move new values into memory. By separating the time of loads and stores for matching cache lines, full pipelining and multi-entry cache benefits is obtained.

#### Col. 2, lines 48-57

While McGehearty does teach a mechanism for cache collision avoidance, the mechanism is not based on "detail loops" as required in Appellants' claimed invention. Instead, McGehearty implements a cache manager 19 that monitors and controls the flow of data into and out of the cache 10. When data is being loaded into the cache 10 from memory 15, the cache manager 19 compares (by subtraction) the memory source address and the memory destination address to determine whether to implement the enhanced mode, the near cache collision mode or the exact cache collision mode. In contrast, Appellants' claimed invention reduces cache thrashing (collisions) by distributing vector memory references into detail loops.

## C. Did amendments to claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 introduce new matter?

As shown in the "SUMMARY OF THE CLAIMED SUBJECT MATTER" section, the limitations of claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 are supported by Appellants' specification. Accordingly, the Examiner improperly objected to amendments to claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 as introducing new matter under 35 U.S.C § 132 and 37 C.F.R. § 1.121(f).

#### D. Claims 1-8, with claim 1 representing this group

Claim 1, in part, requires "determining dependencies between vector memory references in [a] loop, including determining unidirectional and circular dependencies." Claim 1 further requires "distributing the vector memory references into a plurality of detail loops configured to allocate the vector memory references into a plurality of temporary arrays, sized and located, so that none of the vector memory references are cache synonyms." Claim 1 further requires "analyzing an execution profile of the program after said distributing" and "based on the execution profile, determining whether to repeat...said determining dependencies, and said distributing."

The Examiner's obviousness rejection of claim 1 in view of Carr and McGehearty is improper for at least three reasons. First, the Examiner admits that Carr does not disclose Appellants' claimed "allocat[ing] the vector memory references into a plurality of temporary arrays, sized and located, so that none of

the vector memory references are cache synonyms," and relies on McGehearty as teaching this limitation (see final Office action, page 4, third paragraph). However, McGehearty does not appear to teach or suggest "detail loops" are used to "allocate the vector memory references into a plurality of temporary arrays" as required in claim 1. Instead, McGehearty teaches a cache manager 19 that monitors and controls the flow of data into and out of the cache 11. "To avoid cache collisions that occur when there is an exact mapping of source and destination addresses, the manager 19 loads the data from the memory that would fill multiple cache lines into different registers, via the cache, to start a pipelined copy operation" (See Fig. 1 and Col. 6, lines 23-27). Appellants cannot find a teaching or suggestion in McGehearty that "detail loops... allocate the vector memory references into a plurality of temporary arrays" as required in claim 1.

Second, there is no clear motivation to combine Carr with McGehearty "to avoid cache collisions" as suggested by the Examiner. As previously discussed, Carr optimizes data locality based on a cost model. "The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations" (Page 252, Col. L, lines 7-8). In contrast, McGehearty's cache collision avoidance mode restructures loops when a cache manager 19 determines that data to be stored in a cache 11 will cause cache collisions (Col. 3, lines 50-55 and col. 6, lines 23-32). Because Carr's algorithm is intended to optimize data locality and loop organizations, one of ordinary skill in the art would not think to combine with Carr with McGehearty's loop restructuring mechanism would not be needed if loops have been organized to optimize data locality according to Carr. Likewise, if McGehearty's loop restructuring mechanism is implemented, there is no apparent need to optimize data locality and loop organization according to Carr.

Third, none of the references cited by the Examiner, nor combinations of the references, teaches or suggests "analyzing an execution profile of the program after said distributing" and "based on the execution profile, determining whether to repeat...said determining dependencies, and said distributing" as required in claim 1. On the contrary, Carr states "our approach only performs one evaluation step because it directly determines the best loop permutation" (Page 253, Col. L, lines 22-24). For at least these reasons, Appellants submit that the Examiner improperly rejected claim 1 and that claims 1-8 should be allowed.

## E. Claims 9-16, with claim 9 representing this group

Claim 9, in part, requires "distributing the vector memory references into a plurality of detail loops that serially proceed through strips of the vector memory references and store the strips in temporary arrays so that none of the vector memory references are cache synonyms". Claim 9 further requires "said distributing the vector memory references into a plurality of detail loops is performed by a first computer for execution by a second computer."

The Examiner's obviousness rejection of claim 9 in view of Carr and McGehearty is improper for at least three reasons. First, the Examiner admits that Carr does not disclose Appellants' claimed "a plurality of detail loops that serially proceed through strips of vector memory references and store the strips in temporary arrays so that none of the vector memory references are cache synonyms" and relies on McGehearty as teaching this limitation (see Final Office Action, page 8, first and second paragraphs). However, McGehearty does not appear to teach or suggest "detail loops that serially proceed through strips of vector memory references and store the strips in temporary arrays so that none of the vector memory references are cache synonyms" as required in claim 9. Instead, McGehearty teaches "[t]o avoid cache collisions that occur when there is an exact mapping of source and destination addresses, the [cache] manager 19 loads the data from the memory that would fill multiple cache lines into different registers, via the cache, to start a pipelined copy operation" (See Fig. 1 and Col. lines 23-27). Thus, McGehearty appears to teach a cache manager 19 rather than Appellants' claimed "detail loops" is responsible for cache collision avoidance.

Second, as previously discussed with respect to claim 1, there is no clear motivation to combine Carr with McGehearty "to avoid cache collisions" as suggested by the Examiner. Third, none of the references cited by the Examiner,

nor combinations of the references, appear to teach or suggest "said distributing the vector memory references into a plurality of detail loops is performed by a first computer for execution by a second computer" as required in claim 9. For at least these reasons, Appellants submit that the Examiner improperly rejected claim 9 and that claims 9-16 should be allowed.

## F. Claims 17-23, with claim 17 representing this group

Claim 17, in part, requires "distributing the vector memory references into a plurality of detail loops in response to cache behavior and the dependencies between the vector memory references in the loop, wherein the detail loops cause storage of the vector memory references in temporary arrays that are allocated consecutively so that no temporary [array] elements are cache synonyms." Claim 17 further requires "said distributing the vector memory references into a plurality of detail loops produce[s] code that is substantially independent of a computer architecture" and "performing code optimizations that are dependent on a computer architecture after said distributing."

The Examiner's obviousness rejection of claim 17 in view of Carr and McGehearty is improper for at least three reasons. First, the Examiner recognizes that Carr does not teach Appellants' claimed "storage of the vector memory references in temporary arrays that are allocated consecutively so that no temporary [array] elements are cache synonyms" and relies on McGehearty as teaching this limitation (see final Office action, page 10, first and second paragraphs). However, McGehearty does not appear to teach or suggest "detail loops [that] cause storage of the vector memory references in temporary arrays that are allocated consecutively so that no temporary [array] elements are cache synonyms" as required in claim 17. Instead of Appellants' claimed "detail loops," McGehearty relies on a cache manager 19 that loads data from the memory that would fill multiple cache lines into different registers, via the cache, to start a pipelined copy operation.

Second, as previously discussed with respect to claim 1, there is no clear motivation to combine Carr with McGehearty "to avoid cache collisions" as suggested by the Examiner. Third, none of the references cited by the Examiner,

nor combinations of the references, appear to teach or suggest "said distributing the vector memory references into a plurality of detail loops produce[s] code that is substantially independent of a computer architecture" and "performing code optimizations that are dependent on a computer architecture after said distributing." For at least these reasons, Appellants submit that the Examiner improperly rejected claim 17 and that claims 17-23 should be allowed.

## G. Claims 24, 35-37 and 39-40, with claim 35 representing this group

Claim 35, in part, requires "identifying a portion of the memory references based on the profile, wherein the portion of the memory references is determined to cause cache thrashing." Claim 35 further requires "reducing cache thrashing by distributing the portion of the memory references into distinct loops that allocate strips of the memory references into temporary arrays for execution." Claim 35 further required "the temporary arrays are configured to simultaneously fit in a single cache bank."

The Examiner's obviousness rejection of claim 35 in view of Carr and McGehearty is improper for at least three reasons. First, none of the references cited by the Examiner, nor combinations of the references, teaches or suggests "identifying a portion of the memory references...determined to cause cache thrashing" and "reducing cache thrashing by distributing the portion of the memory references into distinct loops that allocate strips of the memory references into temporary arrays for execution" as required in claim 35. While McGehearty teaches a cache collision avoidance mode that "restructures [a] loop of moving data," McGehearty does not teach or suggest Appellants' claimed "distributing the portion of the memory references into distinct loops that allocate strips of the memory references into temporary arrays for execution."

Second, as previously discussed with respect to claim 1, there is no clear motivation to combine Carr with McGehearty "to avoid cache collisions" as suggested by the Examiner. Third, none of the references cited by the Examiner, nor combinations of the references, appear to teach or suggest "the temporary arrays are configured to simultaneously fit in a single cache bank" as required in

claim 35. The Examiner cites Carr as teaching "knowledge of the cache size, associativity, and replacement policy is essential" (see final Office action, page 5, last paragraph). However, the Carr reference still does not teach or suggest "temporary arrays are configured to simultaneously fit in a single cache bank" as required in claim 35. McGehearty is likewise deficient in this regard. For at least these reasons, Appellants submit that the Examiner improperly rejected claim 35 and that claims 24, 35-37 and 39-40 should be allowed.

## H. Claims 33 and 34, with claim 33 representing this group

Claim 33, in part, requires "modifying the identified portion of the software to reduce the likelihood of cache thrashing by distributing cache synonyms into detail loops configured to allocate the cache synonyms into temporary storage areas, sized and located, to prevent cache thrashing." Claim 33 further requires "said modifying occurs before optimizations that are based on an architecture of the computer system."

The Examiner's obviousness rejection of claim 33 in view of Carr and McGehearty is improper for at least three reasons. First, the Examiner admits that Carr does not teach "distributing cache synonyms into detail loops configured to allocate the cache synonyms into temporary storage areas, sized and located, to prevent cache thrashing" and relies on McGehearty as teaching this limitation. However, McGehearty does not appear to teach or suggest "detail loops configured to allocate the cache synonyms into temporary storage areas, sized and located, to prevent cache thrashing" as required in claim 33. Instead of Appellants' claimed "detail loops," McGehearty relies on a cache manager 19 that loads data from the memory that would fill multiple cache lines into different registers, via the cache, to start a pipelined copy operation.

Second, as previously discussed with respect to claim 1, there is no clear motivation to combine Carr with McGehearty "to avoid cache collisions" as suggested by the Examiner. Third, none of the references cited by the Examiner, nor combinations of the references, appear to teach or suggest "said [distributing cache synonyms into detail loops] occurs before optimizations that are based on an architecture of the computer system." For at least these reasons, Appellants

submit that the Examiner improperly rejected claim 33 and that claims 33 and 34 should be allowed.

## I. Claims 41-47, with claim 41 representing this group

Claim 41, in part, requires "means for distributing the vector memory references into a plurality of detail loops configured to serially process strips of the vector memory references so that thrashing does not occur." Claim 41 further requires "means for determining an execution profile of the program after said distributing occurs" and "means for selectively repeating...said means for distributing the vector memory references into a plurality of detail loops."

The Examiner improperly rejected the limitations of claim 41 in view of Carr and McGehearty for at least three reasons. First, the Examiner recognizes that Carr does not teach Appellants' claimed "distributing the vector memory references into a plurality of detail loops configured to serially process strips of the vector memory references so that thrashing does not occur" and relies on McGehearty as teaching this limitation (see final Office action, page 8, first paragraph). However, McGehearty does not appear to teach or suggest "detail loops configured to serially process strips of the vector memory references so that thrashing does not occur" as required in claim 41. Instead of Appellants' claimed "detail loops," McGehearty relies on a cache manager 19 that loads data from the memory that would fill multiple cache lines into different registers, via the cache, to start a pipelined copy operation.

Second, as previously discussed with respect to claim 1, there is no clear motivation to combine Carr with McGehearty "to avoid cache collisions" as suggested by the Examiner. Third, none of the references cited by the Examiner, nor combinations of the references, appear to teach or suggest "selectively repeating...distributing the vector memory references into a plurality of detail loops." On the contrary, Carr states "our approach only performs one evaluation step because it directly determines the best loop permutation" (Page 253, Col. L, lines 22-24). For at least these reasons, Appellants submit that the Examiner improperly rejected the limitations of claim 41 and that claims 41-47 should be allowed.

#### VIII. CONCLUSION

For the reasons stated above, Appellants respectfully submit that the Examiner erred in rejecting all pending claims. It is believed that no extensions of time or fees are required, beyond those that may otherwise be provided for in documents accompanying this paper. However, in the event that additional extensions of time are necessary to allow consideration of this paper, such extensions are hereby petitioned under 37 C.F.R. § 1.136(a), and any fees required (including fees for net addition of claims) are hereby authorized to be charged to Hewlett-Packard Development Company's Deposit Account No. 08-2025.

Respectfully submitted,

Alan D. Christenson PTO Reg. No. 54,036 CONLEY ROSE, P.C.

(713) 238-8000 (Phone) (713) 238-8008 (Fax)

**AGENT FOR APPELLANTS** 

HEWLETT-PACKARD COMPANY Intellectual Property Administration Legal Dept., M/S 35 P.O. Box 272400 Fort Collins, CO 80527-2400

# IX. CLAIMS APPENDIX CLAIM SET 1 (CLAIMS 1-24 and 33-47)

(Previously presented) A method, comprising:
 identifying a loop in a program;
 identifying each vector memory reference in the loop;

determining dependencies between vector memory references in the loop, including determining unidirectional and circular dependencies;

distributing the vector memory references into a plurality of detail loops configured to allocate the vector memory references into a plurality of temporary arrays, sized and located, so that none of the vector memory references are cache synonyms, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references;

analyzing an execution profile of the program after said distributing; and based on the execution profile, determining whether to repeat said identifying a loop, said identifying each vector memory reference, said determining dependencies, and said distributing.

- 2. (Original) A method, as set forth in claim 1, further comprising allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas.
- 3. (Original) A method, as set forth in claim 1, further comprising at least one section loop including the plurality of detail loops.
- 4. (Original) A method, as set forth in claim 1, wherein distributing the vector memory references into a plurality of detail loops further comprises distributing

the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management.

- 5. (Original) A method, as set forth in claim 1, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 6. (Original) A method, as set forth in claim 1, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 7. (Original) A method, as set forth in claim 1, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 8. (Original) A method, as set forth in claim 1, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- (Previously presented) A method, comprising:
   identifying a loop in a program;
   identifying each vector memory reference in the loop;
   determining dependencies between vector memory references in the loop; and

distributing the vector memory references into a plurality of detail loops that serially proceed through strips of the vector memory references and store the strips in temporary arrays so that none of the vector memory references are cache synonyms, wherein the vector memory references that have dependencies therebetween are included in a common detail loop;

wherein said distributing the vector memory references into a plurality of detail loops is performed by a first computer for execution by a second computer.

- 10. (Original) A method, as set forth in claim 9, further comprising allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas.
- 11. (Original) A method, as set forth in claim 9, further comprising at least one section loop including the plurality of detail loops.
- 12. (Original) A method, as set forth in claim 9, wherein distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management.
- 13. (Original) A method, as set forth in claim 9, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 14. (Original) A method, as set forth in claim 9, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of

data associated with the vector memory reference between a cache and main memory.

- 15. (Original) A method, as set forth in claim 9, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 16. (Original) A method, as set forth in claim 9, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 17. (Previously presented) A method, comprising: identifying a loop in a program; identifying each vector memory reference in the loop; determining dependencies between vector memory references in the loop; distributing the vector memory references into a plurality of detail loops in response to cache behavior and the dependencies between the
  - response to cache behavior and the dependencies between the vector memory references in the loop, wherein the detail loops cause storage of the vector memory references in temporary arrays that are allocated consecutively so that no temporary arrays elements are cache synonyms,
  - wherein said identifying a loop, said identifying each vector memory reference, said determining dependencies between vector memory references and said distributing the vector memory references into a plurality of detail loops produce code that is substantially independent of a computer architecture; and

performing code optimizations that are dependent on a computer architecture after said distributing.

- 18. (Original) A method, as set forth in claim 17, wherein distributing the vector memory references further comprises distributing the vector memory references into the plurality of detail loops with each loop having at least one of the identified vector memory references.
- 19. (Original) A method, as set forth in claim 17, further comprising determining dependencies between vector memory references in the loop, and wherein distributing the loop includes distributing the vector memory references into the plurality of detail loops, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop.
- 20. (Original) A method, as set forth in claim 17, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 21. (Original) A method, as set forth in claim 17, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 22. (Original) A method, as set forth in claim 17, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 23. (Original) A method, as set forth in claim 17, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at

least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.

24. (Previously presented) A computer programmed to perform a method, comprising:

identifying a loop in a program;

identifying each vector memory reference in the loop;

determining dependencies between vector memory references in the loop; and

distributing the vector memory references into a plurality of detail loops configured to retrieve strips of the vector memory references and store the strips in temporary arrays, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop,

wherein the temporary arrays are configured to simultaneously fit in a single cache bank.

25-32. (Canceled).

33. (Previously presented) A method for reducing the likelihood of cache thrashing by software to be executed on a computer system having a cache, comprising:

executing the software on the computer system;

generating a profile indicating the manner in which the software uses the cache;

identifying a portion of the software that exhibits cache thrashing based on the profile data; and

modifying the identified portion of the software to reduce the likelihood of cache thrashing by distributing cache synonyms into detail loops configured to allocate the cache synonyms into temporary storage areas, sized and located, to prevent cache thrashing,

wherein said modifying occurs before optimizations that are based on an architecture of the computer system.

34. (Previously presented) A method, as set forth in claim 33, wherein modifying the identified portion of the software to reduce the likelihood of cache thrashing further comprises:

identifying a loop in the identified portion of the software;

identifying each vector memory reference in the identified loop;

determining dependencies between the vector memory references in the identified loop of the software, including determining unidirectional and circular dependencies; and

reducing cache thrashing by distributing the vector memory references into a plurality of detail loops, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references.

35. (Previously presented) A method for reducing the likelihood of cache thrashing by software to be executed on a computer system having a cache, comprising:

executing the software on the computer system;

generating a profile indicating the manner in which the memory references of the software use the cache;

identifying a portion of the memory references based on the profile, wherein the portion of the memory references is determined to cause cache thrashing; and

- reducing cache thrashing by distributing the portion of the memory references into distinct loops that allocate strips of the memory references into temporary arrays for execution,
- wherein the temporary arrays are configured to simultaneously fit in a single cache bank.
- 36. (Previously presented) The computer of claim 24 wherein the temporary arrays are allocated consecutively such that no temporary array elements are cache synonyms.
- 37. (Previously presented) The computer of claim 24 wherein the details loops are allocated into section loops that cause iterative execution of the detail loops based on a size of the strips.
- 38. (Previously presented) The program storage medium of claim 25 wherein the temporary arrays are allocated consecutively such that no temporary array elements are cache synonyms.
- 39. (Previously presented) The method of claim 35 wherein the temporary arrays are located and sized to reduce cache thrashing.
- 40. (Previously presented) The method of claim 35 wherein the temporary arrays are allocated consecutively and iteratively executed by a size of the strips.
- 41. (Previously presented) A compiler, comprising:

  means for identifying a loop in a program;

  means for identifying each vector memory reference in the loop;

  means for determining dependencies between vector memory references

  in the loop, including determining unidirectional and circular

dependencies;

means for distributing the vector memory references into a plurality of detail loops configured to serially process strips of the vector memory references so that thrashing does not occur, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references;

means for determining an execution profile of the program after said distributing occurs; and

means for selectively repeating use of said means for identifying a loop, said means for identifying each vector memory reference, said means for determining dependencies, said means for distributing the vector memory references into a plurality of detail loops, and said means for determining an execution profile based on said execution profile.

- 42. (Original) A compiler, as set forth in claim 41, further comprising means for allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas.
- 43. (Original) A compiler, as set forth in claim 41, wherein the means for distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management.
- 44. (Original) A compiler, as set forth in claim 41, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.

- 45. (Original) A compiler, as set forth in claim 41, further comprising means for inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 46. (Original) A compiler, as set forth in claim 41, further comprising means for performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 47. (Original) A compiler, as set forth in claim 41, further comprising means for inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.

## **CLAIM SET 2 (CLAIMS 1-40)**

(Previously presented) A method, comprising:
 identifying a loop in a program;
 identifying each vector memory reference in the loop;

determining dependencies between vector memory references in the loop, including determining unidirectional and circular dependencies;

distributing the vector memory references into a plurality of detail loops configured to allocate the vector memory references into a plurality of temporary arrays, sized and located, so that none of the vector memory references are cache synonyms, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references;

analyzing an execution profile of the program after said distributing; and based on the execution profile, determining whether to repeat said identifying a loop, said identifying each vector memory reference, said determining dependencies, and said distributing.

- 2. (Original) A method, as set forth in claim 1, further comprising allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas.
- 3. (Original) A method, as set forth in claim 1, further comprising at least one section loop including the plurality of detail loops.
- 4. (Original) A method, as set forth in claim 1, wherein distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management.

- 5. (Original) A method, as set forth in claim 1, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 6. (Original) A method, as set forth in claim 1, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 7. (Original) A method, as set forth in claim 1, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 8. (Original) A method, as set forth in claim 1, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 9. (Previously presented) A method, comprising:

identifying a loop in a program;

identifying each vector memory reference in the loop;

determining dependencies between vector memory references in the loop; and

distributing the vector memory references into a plurality of detail loops that serially proceed through strips of the vector memory references and store the strips in temporary arrays so that none of the vector

memory references are cache synonyms, wherein the vector memory references that have dependencies therebetween are included in a common detail loop;

wherein said distributing the vector memory references into a plurality of detail loops is performed by a first computer for execution by a second computer.

- 10. (Original) A method, as set forth in claim 9, further comprising allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas.
- 11. (Original) A method, as set forth in claim 9, further comprising at least one section loop including the plurality of detail loops.
- 12. (Original) A method, as set forth in claim 9, wherein distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops that each contain at least one vector memory reference that could benefit from cache management.
- 13. (Original) A method, as set forth in claim 9, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 14. (Original) A method, as set forth in claim 9, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.

- 15. (Original) A method, as set forth in claim 9, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 16. (Original) A method, as set forth in claim 9, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 17. (Previously presented) A method, comprising: identifying a loop in a program; identifying each vector memory reference in the loop; determining dependencies between vector memory references in the loop; distributing the vector memory references into a plurality of detail loops in response to cache behavior and the dependencies between the vector memory references in the loop, wherein the detail loops cause storage of the vector memory references in temporary arrays that are allocated consecutively so that no temporary arrays elements are cache synonyms,
  - wherein said identifying a loop, said identifying each vector memory reference, said determining dependencies between vector memory references and said distributing the vector memory references into a plurality of detail loops produce code that is substantially independent of a computer architecture; and
  - performing code optimizations that are dependent on a computer architecture after said distributing.

- 18. (Original) A method, as set forth in claim 17, wherein distributing the vector memory references further comprises distributing the vector memory references into the plurality of detail loops with each loop having at least one of the identified vector memory references.
- 19. (Original) A method, as set forth in claim 17, further comprising determining dependencies between vector memory references in the loop, and wherein distributing the loop includes distributing the vector memory references into the plurality of detail loops, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop.
- 20. (Original) A method, as set forth in claim 17, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 21. (Original) A method, as set forth in claim 17, further comprising inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 22. (Original) A method, as set forth in claim 17, further comprising performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 23. (Original) A method, as set forth in claim 17, further comprising inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data

associated with the vector memory reference between a cache and main memory.

24. (Previously presented) A computer programmed to perform a method, comprising:

identifying a loop in a program;

identifying each vector memory reference in the loop;

determining dependencies between vector memory references in the loop; and

distributing the vector memory references into a plurality of detail loops configured to retrieve strips of the vector memory references and store the strips in temporary arrays, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop,

wherein the temporary arrays are configured to simultaneously fit in a single cache bank.

25. (Previously presented) A program storage medium encoded with instructions that, when executed by a computer, perform a method, comprising:

identifying a loop in a program;

identifying each vector memory reference in the loop;

determining dependencies between vector memory references in the loop; and

generating an expanded code of the program by distributing the vector memory references into a plurality of detail loops configured to allocate the vector memory references into temporary arrays that avoid cache synonyms, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop,

wherein the expanded code is substantially independent of computer architectures.

26. (Previously presented) A compiler, comprising:

means for identifying a loop in a program;

means for identifying each vector memory reference in the loop;

means for determining dependencies between vector memory references in the loop, including determining unidirectional and circular dependencies;

means for distributing the vector memory references into a plurality of detail loops configured to serially process strips of the vector memory references so that thrashing does not occur, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references;

means for determining an execution profile of the program after said distributing occurs; and

- means for selectively repeating use of said means for identifying a loop, said means for identifying each vector memory reference, said means for determining dependencies, said means for distributing the vector memory references into a plurality of detail loops, and said means for determining an execution profile based on said execution profile.
- 27. (Original) A compiler, as set forth in claim 26, further comprising means for allocating a plurality of temporary storage areas within a cache and determining the size of each temporary storage area based on the size of the cache and the number of temporary storage areas.
- 28. (Original) A compiler, as set forth in claim 26, wherein the means for distributing the vector memory references into a plurality of detail loops further comprises distributing the vector memory references into a plurality of detail loops

that each contain at least one vector memory reference that could benefit from cache management.

- 29. (Original) A compiler, as set forth in claim 26, further comprising inserting cache management instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 30. (Original) A compiler, as set forth in claim 26, further comprising means for inserting prefetch instructions into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 31. (Original) A compiler, as set forth in claim 26, further comprising means for performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 32. (Original) A compiler, as set forth in claim 26, further comprising means for inserting at least one of a prefetch instruction and a cache management instruction into at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory, and performing loop unrolling on at least one of said detail loops to control movement of data associated with the vector memory reference between a cache and main memory.
- 33. (Previously presented) A method for reducing the likelihood of cache thrashing by software to be executed on a computer system having a cache, comprising:

executing the software on the computer system;

- generating a profile indicating the manner in which the software uses the cache;
- identifying a portion of the software that exhibits cache thrashing based on the profile data; and
- modifying the identified portion of the software to reduce the likelihood of cache thrashing by distributing cache synonyms into detail loops configured to allocate the cache synonyms into temporary storage areas, sized and located, to prevent cache thrashing,
- wherein said modifying occurs before optimizations that are based on an architecture of the computer system.
- 34. (Previously presented) A method, as set forth in claim 33, wherein modifying the identified portion of the software to reduce the likelihood of cache thrashing further comprises:

identifying a loop in the identified portion of the software;

identifying each vector memory reference in the identified loop;

- determining dependencies between the vector memory references in the identified loop of the software, including determining unidirectional and circular dependencies; and
- reducing cache thrashing by distributing the vector memory references into a plurality of detail loops, wherein the vector memory references that have circular dependencies therebetween are included in a common detail loop, and wherein the detail loops are ordered according to the unidirectional dependencies between the memory references.
- 35. (Previously presented) A method for reducing the likelihood of cache thrashing by software to be executed on a computer system having a cache, comprising:

executing the software on the computer system;

- generating a profile indicating the manner in which the memory references of the software use the cache;
- identifying a portion of the memory references based on the profile, wherein the portion of the memory references is determined to cause cache thrashing; and
- reducing cache thrashing by distributing the portion of the memory references into distinct loops that allocate strips of the memory references into temporary arrays for execution,
- wherein the temporary arrays are configured to simultaneously fit in a single cache bank.
- 36. (Previously presented) The computer of claim 24 wherein the temporary arrays are allocated consecutively such that no temporary array elements are cache synonyms.
- 37. (Previously presented) The computer of claim 24 wherein the details loops are allocated into section loops that cause iterative execution of the detail loops based on a size of the strips.
- 38. (Previously presented) The program storage medium of claim 25 wherein the temporary arrays are allocated consecutively such that no temporary array elements are cache synonyms.
- 39. (Previously presented) The method of claim 35 wherein the temporary arrays are located and sized to reduce cache thrashing.
- 40. (Previously presented) The method of claim 35 wherein the temporary arrays are allocated consecutively and iteratively executed by a size of the strips.

X. EVIDENCE APPENDIX

None.

XI. RELATED PROCEEDINGS APPENDIX None.